**Computer Architecture Lab 5**

**Name: Eddy Kimani**

**Reg No.: SCT212-0596/2021**

**E1: Cache**

1. **Baseline Implementation (Direct-mapped cache, 16KB, 64B line)**

**Cold Misses**

Each cache line holds 16 elements (64B / 4B)

4096 elements / 16 = 256 lines per array

Total cold misses: 2 x 256 = 512 (for X and Y)

**Conflict Misses**

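X[i] and Y[i] map to the same cache index, so the load of Y[i] evicts the line holding X[i]

Every store to X[i] therefore misses: 4096 misses

Of the 16 loads of Y[i] that fall in one line, the first is a cold miss and the other 15 are conflict misses: (15/16) x 4096 = 3840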

Total conflict misses: 4096 + 3840 = 7936

**Total Misses**

512 (cold) + 7936 (conflict) = 8448

**Miss Rate**

8448 / 12288 = 68.75% (12288 accesses = 3 memory accesses per iteration x 4096 iterations)
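
The hand count above can be cross-checked with a small simulation. This is only a sketch under stated assumptions, none of which come from the lab text itself: each iteration performs load X[i], load Y[i], store X[i] in that order; X starts at byte address 0 and Y follows immediately (16 KB later); the cache is write-allocate. The names `count_misses` and `loop_trace` are made up for illustration.

```python
def count_misses(addresses, cache_bytes=16 * 1024, line_bytes=64):
    """Count misses in a direct-mapped, write-allocate cache."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # block number currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines
        if tags[index] != block:       # cold or conflict miss: fetch the block
            misses += 1
            tags[index] = block
    return misses


def loop_trace(n=4096, elem_bytes=4, x_base=0, y_base=16 * 1024):
    """Assumed access order per iteration: load X[i], load Y[i], store X[i]."""
    for i in range(n):
        yield x_base + i * elem_bytes  # load X[i]
        yield y_base + i * elem_bytes  # load Y[i]
        yield x_base + i * elem_bytes  # store X[i]


misses = count_misses(loop_trace())
accesses = 3 * 4096
miss_rate = 100 * misses / accesses
```

Under these assumptions `misses` comes out to 8448 and `miss_rate` to 68.75, which agrees with the totals above.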

2. **Software Optimizations**

**Option 1: Interleave X and Y (convert Structure of Arrays to Array of Structures)**

Each cache line holds 8 X-Y pairs (8 x 8B = 64B)

Cold misses: 4096 / 8 = 512

Conflict misses: 0

Total misses: 512

Miss rate: 512 / 12288 ≈ 4.17%
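
The interleaved layout can be checked the same way. The sketch below assumes pair i occupies 8 bytes starting at address 8 x i, with X[i] in the first 4 bytes and Y[i] in the last 4, and the same assumed access order (load X[i], load Y[i], store X[i]). The function names are illustrative only.

```python
def count_misses(addresses, cache_bytes=16 * 1024, line_bytes=64):
    """Count misses in a direct-mapped, write-allocate cache."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines
        if tags[index] != block:
            misses += 1
            tags[index] = block
    return misses


def interleaved_trace(n=4096, pair_bytes=8):
    """Assumed layout: pair i = (X[i], Y[i]) stored back to back."""
    for i in range(n):
        x_addr = i * pair_bytes        # X[i] is the first 4 bytes of pair i
        y_addr = x_addr + 4            # Y[i] is the last 4 bytes of pair i
        yield x_addr                   # load X[i]
        yield y_addr                   # load Y[i]
        yield x_addr                   # store X[i]


misses = count_misses(interleaved_trace())
miss_rate = 100 * misses / (3 * 4096)
```

Each 64 B line is touched by 8 consecutive iterations and never needed again, so only the first access to each line misses.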

**Option 2: Pad Memory to Avoid Aliasing**

Offset Y in memory to prevent mapping to same cache sets as X

Cold Misses: 2 x (4096 / 16) = 512

Conflict Misses: 0

Total Misses: 512

Miss rate: 512 / 12288 ≈ 4.17%
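
A sketch of the padding idea, assuming a pad of exactly one cache line (64 B) between the end of X and the start of Y; the pad size is an assumption chosen for illustration, not a value given in the lab. With it, Y[i] maps one index higher than X[i], so the two streams no longer evict each other.

```python
def count_misses(addresses, cache_bytes=16 * 1024, line_bytes=64):
    """Count misses in a direct-mapped, write-allocate cache."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines
        if tags[index] != block:
            misses += 1
            tags[index] = block
    return misses


def loop_trace(y_base, n=4096, elem_bytes=4, x_base=0):
    """Assumed access order per iteration: load X[i], load Y[i], store X[i]."""
    for i in range(n):
        yield x_base + i * elem_bytes
        yield y_base + i * elem_bytes
        yield x_base + i * elem_bytes


PAD_BYTES = 64                                   # hypothetical pad: one cache line
unpadded = count_misses(loop_trace(y_base=16 * 1024))
padded = count_misses(loop_trace(y_base=16 * 1024 + PAD_BYTES))
```

Here `unpadded` reproduces the baseline count and `padded` drops to the 512 cold misses.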

3. **Hardware Optimizations**

**Option 1: Double cache size (32KB)**

Cold Misses: 512

Conflict Misses: 0

Miss Rate: 512 / 12288 ≈ 4.17%

**Option 2: Make cache set-associative**

Cold Misses: 512

Conflict Misses: 0

Miss Rate: 512 / 12288 ≈ 4.17%
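
Both the larger cache and the set-associative cache can be checked with one simulator that takes the capacity and the number of ways as parameters. This is a sketch assuming LRU replacement, 2-way associativity for the set-associative case, contiguous arrays (X at address 0, Y right after it), and the access order load X[i], load Y[i], store X[i]; the lab text does not fix any of these.

```python
def count_misses(addresses, cache_bytes, line_bytes=64, ways=1):
    """Count misses in a set-associative, write-allocate cache with LRU."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [[] for _ in range(num_sets)]   # each list is ordered LRU -> MRU
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        lines = sets[block % num_sets]
        if block in lines:
            lines.remove(block)            # hit: re-appended below as MRU
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)               # set is full: evict the LRU block
        lines.append(block)
    return misses


def loop_trace(n=4096, elem_bytes=4, x_base=0, y_base=16 * 1024):
    """Assumed access order per iteration: load X[i], load Y[i], store X[i]."""
    for i in range(n):
        yield x_base + i * elem_bytes
        yield y_base + i * elem_bytes
        yield x_base + i * elem_bytes


baseline = count_misses(loop_trace(), cache_bytes=16 * 1024, ways=1)
doubled = count_misses(loop_trace(), cache_bytes=32 * 1024, ways=1)
two_way = count_misses(loop_trace(), cache_bytes=16 * 1024, ways=2)
```

With 32 KB, X and Y occupy disjoint index ranges; with two ways, the X line and the Y line for the same index share a set. In both cases only the 512 cold misses remain.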

**Option 3: Increase block size by a factor of z**

Cold misses: 4096 / (16 x z)

Conflict misses: 2 x 4096 = 8192 (Y evicts X and vice versa)

A larger block size reduces cold misses but increases conflict misses, because X and Y still map to the same index
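
The effect of z can be explored with a short sweep. The sketch assumes the cache stays 16 KB and direct-mapped, the arrays stay contiguous (so X[i] and Y[i] still share an index), and the access order is load X[i], load Y[i], store X[i]. The values of z tried here are arbitrary examples.

```python
def count_misses(addresses, cache_bytes=16 * 1024, line_bytes=64):
    """Count misses in a direct-mapped, write-allocate cache."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines
        if tags[index] != block:
            misses += 1
            tags[index] = block
    return misses


def loop_trace(n=4096, elem_bytes=4, x_base=0, y_base=16 * 1024):
    """Assumed access order per iteration: load X[i], load Y[i], store X[i]."""
    for i in range(n):
        yield x_base + i * elem_bytes
        yield y_base + i * elem_bytes
        yield x_base + i * elem_bytes


# Total misses for block sizes of 64 B, 128 B and 256 B (z = 1, 2, 4).
misses_by_z = {z: count_misses(loop_trace(), line_bytes=64 * z) for z in (1, 2, 4)}
```

The simulated totals equal 8192 + 4096 / (16 x z), the sum of the two counts above, so the total stays just above 8192 however large the block becomes.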

**Option 4: Add next-line prefetcher**

Cold misses: 2 x (4096 / 32) = 256

Conflict misses: 4096 + (31/32) x 4096 = 8064

Miss rate: (256 + 8064) / 12288 ≈ 67.71%

**Option 5: Add victim cache**

Cold misses: 256 (from X)

Conflict misses: 4096 (Y evicts X, but stores to X hit in victim cache)

Total misses: 4352

Miss rate: 4352 / 12288 ≈ 35.42%
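
To compare the options side by side, the miss counts from this report can be turned into miss rates in one place. The snippet below only repeats the arithmetic; it takes the cold and conflict counts exactly as derived in each section and does not model the hardware. The dictionary keys are labels chosen for illustration.

```python
ACCESSES = 3 * 4096   # 3 memory accesses per iteration x 4096 iterations

# cold + conflict misses, as counted in the sections above
miss_counts = {
    "baseline": 512 + 7936,
    "interleave X and Y": 512,
    "pad Y": 512,
    "32KB cache": 512,
    "set-associative": 512,
    "next-line prefetcher": 256 + 8064,
    "victim cache": 256 + 4096,
}

miss_rates = {name: 100 * count / ACCESSES for name, count in miss_counts.items()}

# Print the options from lowest to highest miss rate.
for name, rate in sorted(miss_rates.items(), key=lambda item: item[1]):
    print(f"{name:22s} {miss_counts[name]:5d} misses  {rate:6.2f}%")
```

The ordering makes the pattern visible: every option that removes the aliasing between X and Y reaches the cold-miss floor, while the prefetcher and the victim cache leave some or all of the conflict misses in place.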